Assignment Projecti Examplelp

Pi https://eduassistpro.gil Nub.io/

Add WeChat edu\_assist\_pro

COMP 273 Pipelines (1)

Based on Patterson's 61C

## Review (1/3)

 Datapath is the hardware that performs operations necessary to execute programs.

Assignment Project Exam Help

Control instructs dat o next.

https://eduassistpro.github.io/

- Datapath needs:
  - access to storage (general purpos
     access to storage (general purpos
  - computational ability (ALU)
  - helper hardware (local registers and PC)

COMP 273 Pipelines (2)

Based on Patterson's 610

## Review (2/3)

- Five stages of datapath (executing an instruction):
  - 1. Instruction Fetch (Increment PC)
    Assignment Project Exam Help
    2. Instruction s)

  - 3. ALU (Compuhttps://eduassistpro.github.io/
  - 4. Memory Accestd WeChat edu\_assist\_pro
  - 5. Write to Registers
- ALL instructions must go through ALL five stages.

#### Review Datapath



COMP 273 Pipelines (4)

Based on Patterson's 61C

#### Outline

- Pipelining Analogy
- Pipelining Instruction Execution Assignment Project Exam Help
- Hazards

https://eduassistpro.github.io/

Add WeChat edu\_assist\_pro

COMP 273 Pipelines (5)

Based on Patterson's 61C

Assignment Project Exam Help

https://eduassistpro.github.io/

Add WeChat edu\_assist\_pro







# With or without your laundry

 Bono, The Edge, Adam Clayton, and Larry Mullen Jr., each have one load of clothes to wash, dry,



fold, and passing many ent Project Exam Help

https://eduassistpro\_\_\_hub.io/

Washer takes 30 minutes

Add WeChat edu\_assst\_pro

**Dryer takes 30 minutes** 

- "Folder" takes 30 minutes
- "Stasher" takes 30 minutes to put clothes into drawers





# Sequential Laundry



Sequential laundry takes 8 hours for 4 loads

COMP 273 Pipelines (8)

Based on Patterson's 61

# Pipelined Laundry



COMP 273 Pipelines (9) Based on Patterson's 61C

#### **General Definitions**

- Latency
  - Time to completely execute a certain task
     Assignment Project Exam Help
     For example, time to read a sector from disk is disk access time or disk latency
- Throughput

https://eduassistpro.github.io/

- Amount of work that Add Medding edu\_assistopiof time

## Pipelining Lessons (1/2)



COMP 273 Pipelines (11)

Based on Patterson's 61

## Pipelining Lessons (2/2)



COMP 273 Pipelines (12)

Based on Patterson's 61

## Steps in Executing MIPS

1) IFetch: Fetch Instruction, Increment PC

Instruction, Read Registers Assignment Project Exam Help 2) Decode:

3) Execute:

Mem-ref: Calculat https://eduassistpro.github.io/

Arith-log: Perform Operation edu\_assist\_pro

4) Memory:

Load: Read Data from Memory

Store: Write Data to Memory

5) Write Back: Write Data to Register

## Pipelined Execution Representation



 Every instruction must take same number of steps, also called pipeline "stages", so some will go idle sometimes

COMP 273 Pipelines (14)

Based on Patterson's 61

Review: Datapath for MIPS



Use datapath figure to represent pipeline



COMP 273 Pipelines (15)

Based on Patterson's 61

### **Graphical Pipeline Representation**

(In Reg, right half highlight read, left half write)
Time (clock cycles)



COMP 273 Pipelines (16)

Based on Patterson's 61

## Example

- Suppose 2 ns for memory access, 2 ns for ALU operation, and 1 ns for register file read or write; what is the *instruction* execution rate? Assignment Project Exam Help
- Non-pipelined Exec https://eduassistpro.githlub.lbp:

```
LW: IF + Read Reg + ALU + Memory +
= 2 + 1 + 2 + 2 + 1 = 8 ns

ADD: IF + Read Reg + ALU + Write Reg
= 2 + 1 + 2 + 1 = 6 ns
```

Pipelined Execution:

Max(IF,Read Reg,ALU,Memory,Write Reg) = 2 ns

COMP 273 Pipelines (17)

Based on Patterson's 61

## Problems for Pipelined Processors

Structural hazards:

 instructions
 https://eduassistpro.github.io/

- Control hazards: Pipelining of that edu\_assisherinstructions can stall the pipeline until the hazard
- Data hazards: Instruction depends on result of prior instruction still in the pipeline

COMP 273 Pipelines (18)

Based on Patterson's 610

#### Structural Hazard #1: Single Memory (1/2)



Read same memory twice in same clock cycle

COMP 273 Pipelines (19)

Based on Patterson's 61

#### Structural Hazard #1: Single Memory (2/2)

#### • Solution:

- Infeasible and inefficient to create an independent second memory, Assignment Project Exam Help but can simulate this by having two Level 1 Caches
- Use an L1 Instructio https://eduassistpro.githubbie/
- Need more complex hardware that edu\_assisenphoth caches miss

COMP 273 Pipelines (20)

Based on Patterson's 610

#### Structural Hazard #2: Registers (1/2)



Can't read and write to registers simultaneously

COMP 273 Pipelines (21)

Based on Patterson's 61

## Structural Hazard #2: Registers (2/2)

- Fact: Register access is VERY fast: takes less than half the time of ALU stage
   Assignment Project Exam Help
- Solution: modify clo
  - always Write to Regi
     https://eduassistpro.github.jo/ ch clock cycle
  - always **Read** from Registers durin edu\_assist propeach clock cycle
  - Result: can perform Read and Write during same clock cycle

COMP 273 Pipelines (22)

Based on Patterson's 610

## Control Hazard: Branching (1/7)



Where do we do the compare for the branch?

COMP 273 Pipelines (23)

Based on Patterson's 61

## Control Hazard: Branching (2/7)

- We naively put branch decision-making hardware in ALU stage
  - therefore two more instructions after the branch will always be fetched, whether or not the branch is taken
- Consider: Desired fu https://eduassistpro.github.io/
  - if we do not take the the the the domain the domain the domain that the domain the domain that th
  - if we take the branch, don't execute any instructions after the branch, just go to the desired label

COMP 273 Pipelines (24)

Based on Patterson's 61

## Control Hazard: Branching (3/7)

- Initial Solution: **Stall** until decision is made
  - insert "no-op" instructions that accomplish nothing, just take time Assignment Project Exam Help
     Drawback: branches
  - Drawback: branches ch (assuming comparator is put in ALU stage)
     https://eduassistpro.github.io/

Add WeChat edu\_assist\_pro

COMP 273 Pipelines (25)

Based on Patterson's 61C

## Control Hazard: Branching (4/7)

#### • Optimization #1:

Move asynchronous comparator up to Stage 2
 Assignment Project Exam Help
 As soon as instructio ident

- As soon as instructio identifies is as a branch), immediately make a https://eduassistpro.githueboiothe PC (if necessary)
   Add WeChat edu\_assist\_pro
- Benefit: since branch is complete in Stage 2, only one unnecessary instruction is fetched, so only one no-op is needed
- Side Note: This means that branches are idle in Stages 3, 4 and 5.

COMP 273 Pipelines (26)

Based on Patterson's 610

## Control Hazard: Branching (5/7)

Insert a single no-op (bubble)



Impact: 2 clock cycles per branch instruction, slow

COMP 273 Pipelines (27)

Based on Patterson's 61

## Control Hazard: Branching (6/7)

- Optimization #2: Redefine branches
  - Old definition: if we take the branch, none of the instructions after the branch get executed by accident
  - New definition: whe https://eduassistpro.gethubrich, the single instruction immediately following edu\_assist\_pro executed
  - This is called the branch-delay slot

COMP 273 Pipelines (28)

Based on Patterson's 61C

## Control Hazard: Branching (7/7)

- Notes on Branch-Delay Slot
  - Worst-Case Scenario: can always put a no-op in the branch-delay slot Assignment Project Exam Help
     Better Case: can find ing the branch which can
  - Better Case: can find ing the branch which can be placed in the bra https://eduassistpro.gialfietetimg flow of the program
    Add WeChat edu\_assist\_pro
    - Re-ordering instructions is a common method of speeding up programs
    - Compiler must be very smart in order to find instructions to do this
    - Usually can find such an instruction at least 50% of the time
    - Jumps also have a delay slot!

COMP 273 Pipelines (29)

Based on Patterson's 610

#### Example: Nondelayed vs. Delayed Branch

```
Delayed Branch
     Nondelayed Branch
         $8, $9,$10
                            add $1 ,$2,$3
    or
    add $1 ,$2,$3
                            sub $4, $5,$6
            Assignment Project Exam Help
                https://eduas.istpro.github.io/
    sub $4, $
    beq $1, $4AdEWethat edu assist pr$9,$10
    xor $10, $1,$11
                            xor $10, $1,$11
          What can we
          stick in the
          branch delay
Exit:
                        Exit:
          slot?
```

COMP 273 Pipelines (30)

Based on Patterson's 61

## Simulating Hazards

MARS can simulate delayed branches

Assignment Project Exam Help

https://eduassistpro.github.io/

Add WeChat edu\_assist\_pro

COMP 273 Pipelines (31)

Based on Patterson's 61C

#### Data Hazards

#### Dependencies backwards in time are hazards



COMP 273 Pipelines (32)

Based on Patterson's 61

#### Data Hazard Solution: Forwarding

Forward result from one stage to another



"or" hazard solved by register hardware

COMP 273 Pipelines (33)

Based on Patterson's 61

## Data Hazard: Loads (1/4)

Dependencies backwards in time are hazards



- Can't solve with forwarding
- Must stall instruction dependent on load, then forward (more hardware)

COMP 273 Pipelines (34)

Based on Patterson's 61

## Data Hazard: Loads (2/4)

- Hardware must stall pipeline
- Also called "interlock"



COMP 273 Pipelines (35)

Based on Patterson's 61

## Data Hazard: Loads (3/4)

- Instruction slot after a load is called load delay slot
- If that instruction uses the result of the load, then the hardware interlock will stall it signments beginned Exam Help
- If compiler puts an unhttps://eduassistpro.githletislot, then no stall
- Letting the hardware stall the inst edu\_assist\_pro equivalent to putting a nop in the
- Alternatively: could redefine instruction semantics to introduce a delay slot, i.e., after lw (or lb or lbu) instruction, the data isn't actually available in the destination register until an extra instruction later

COMP 273 Pipelines (36)

Based on Patterson's 61

#### Recall: Control Hazard

Decision to branch made during instruction decode, registers can be read in the 2<sup>nd</sup> half of the cycle. Comparison of values very fast with simple XNOR + AND.



COMP 273 Pipelines (37)

Based on Patterson's 61

#### Data-Control Hazard?

Decision to branch might also need a result from the ALU. Can we assume that the branch comparison is fast enough to asynchronously access the ALU result in the ID stage?



COMP 273 Pipelines (38)

Based on Patterson's 61

#### Data-Control Hazard?

Even with the load delay slot, we may need to asynchronously access the load from memory to make the branch decision.



OMP 273 Pipelines (39)

Based on Patterson's 61C

# Optimization (1/3)

 Now that we know what is fast and what is slow, how do we write fast programs?

Assignment Project Exam Help

- As long as we avoid ding pipeline stalls, we maximize instructio https://eduassistpro.github.io/

- First, simplest technique: when the du\_assist\_pro
  - Stalling a cycle or two for a control/data hazard is nothing compared to stalling hundreds of cycles for a cache miss!

COMP 273 Pipelines (40)

Based on Patterson's 610

# Optimization (2/3)

- Instruction reordering:
  - Be aware of delay slots, reorder instructions to put a useful yet Assignment Project Exam Help unrelated instruction in a delay slot.
  - This is a pretty tedio https://eduassistpro.githuelp.ias/semblers do a good job of doing the dirty week f edu\_assist\_pro

COMP 273 Pipelines (41)

Based on Patterson's 610

### Question

 Assume 1 instr/clock, delayed branch, 5 stage pipeline, forwarding, interlock on unresolved load hazards (after 10<sup>3</sup> loops, so pipeline full) Assignment Project Exam Help



 How many pipeline stages (clock cycles) per loop iteration to execute this code?

COMP 273 Pipelines (42)

Based on Patterson's 61

#### Answer

 Assume 1 instr/clock, delayed branch, 5 stage pipeline, forwarding, interlock on unresolved load hazards (after 10<sup>3</sup> loops, so pipeline full) Assignment Project Exam Help

 How many pipeline stages (clock cycles) per loop iteration to execute this code?

COMP 273 Pipelines (43)

Based on Patterson's 61

## Optimization Question

- Instruction Reordering Example
  - The loop takes 7 clock cycles per iteration.

```
Can you improve it Project Exam Help
```

```
Loop: l https://eduassistpro.github.io/n https://eduassistpro.github.io/addid W$Cbat edu_assist_2pro sw $t0, 0($s1) addiu $s1, $s1, -4 bne $s1, $s3, Loop nop
```

COMP 273 Pipelines (44)

Based on Patterson's 61

## Intruction Reordering Solution

 Reordering can reduce the number of cycles to 5 per iteration:

Assignment Project Exam Help

```
Loop: 1
a https://eduassistpro.github.io/
addid \to\to\hat edu_assis2_pro
bne $$1, $$3, loop
sw $$t0, 4($$1)
```

COMP 273 Pipelines (45)

Based on Patterson's 61

# Optimization (3/3): Loop Unrolling

- Branches are difficult, just avoid them if possible.
- For a loop with a fixed number of iterations, write the assembly code by ju of the loop out repeatedly rather the https://eduassistpro.githyla.io/hes.
- Tradeoff: *more total code* but be edu\_assisted overall and fewer branch instructions  $\rightarrow$  means faster execution time.
- Can also partially unroll large loops!

COMP 273 Pipelines (46)

Based on Patterson's 61

```
addi $s0, $0, $0  # $s0 = 0
loop: lw $t0, 0($s1)
        add $s2, $signment Project Exam Help longer, it would be useful to eliminate the counter of and its
                                       eliminate the counter s0 and instead keep track
                                              et end address s1+4n and eliminate
        addi $s1, $s https://eduassistpro.githi
        addi $s0, $s0AddWeChatedu_assist_pro
        slt $t1, $s0, 4 # if $s0 < 4
        bne $t1, $0, loop # then loop
```

What does this code do



```
lw ($t0) 0 ($s1)
add $s2, $s2, $t0
                Assignment Project Exam Help
lw $t0, 4($s1)
add $s2, $s2, $t0 https://eduassistpro.github.io/
lw $t0, 8($s1) Add WeChat edu_assist_pro
add $s2, $s2, $t0
lw $t0, 12($s1)
add $s2, $s2, $t0
```

COMP 273 Pipelines (48)

Based on Patterson's 61

```
lw $t0, 0($s1)
lw $t1, 4($s1)
lw $t2, 8($s1) Assignment Project Exam Help
lw $t3, 12 ($s1) https://eduassistpro.githubsio/
add $s2, $s2, $t0 Add WeChat edu_assist_pro
add $s2, $s2, $t1
add $s2, $s2, $t2
add $s2, $s2, $t3
```

COMP 273 Pipelines (49)

Based on Patterson's 61

```
lw $t0, 0($s1)
  lw $t1, 4($s1)
add $s2, $s2, $\frac{Assignment Project Exam Help}{\frac{Assignment Project Exam Help}{\frac{Assignmen
 add $s2, $s2, $t1 https://eduassistpro.githubsio/
  lw $t0, 8($s1) Add WeChat edu_assist_pro
  lw $t1, 12($s1)
  add $s2, $s2, $t0
  add $s2, $s2, $t1
```

COMP 273 Pipelines (50)

Based on Patterson's 61

#### **Branch Prediction**

 Modern processors with long pipelines also use a technique called branch prediction.

Assignment Project Exam Help

• As soon as any bran hed, make an instantaneous guess https://eduassistpro.githlahieh will be taken or not and keep executiaga weathin edu\_assistspooright.

 If we guess wrong, undo everything that we shouldn't have done and start over at the correct place.

COMP 273 Pipelines (51)

Based on Patterson's 610

#### **Branch Prediction**

- A naïve branch prediction algorithm:
  - If the branch target is backward, guess that it will be taken. If the target is forward, guess that it will not be taken.
  - What's the rationale https://eduassistpro.github.io/
- More elaborate algorithms actual edu\_assish pwo often each branch instruction is taken and modify their behavior — called dynamic branch prediction.

COMP 273 Pipelines (52)

Based on Patterson's 61

## The Big Picture

• Although the compiler generally relies on the hardware to resolve hazards and thereby ensure correct execution, the compiler must unde resolve Exam Help to achieve the best performance. Otherhttps://eduassistpro.gtallp.woill reduce the performance of the compiled for edu\_assist\_pro

COMP 273 Pipelines (53)

Based on Patterson's 610

### Questions

- The book uses the excellent "washing and drying clothes" analogy to explain pipelines and the three common hazards. Describe another an Project Exam Help lines are useful and explain what the thrhttps://eduassistpro.giatudomain. Aim for an analogy as far from clothese and edu\_assistspros possible -- think outside the box.
- What requirements did the original MIPS processor place on compilers to avoid the need for hardware to stall the pipeline?

COMP 273 Pipelines (54)

Based on Patterson's 61

### Questions



- A. Thanks to pipelining, I have reduced the time it took me to wash my shirt.
- B. Longer pip \_\_\_\_\_\_win (since less work per stage Welfat edu\_assist\_pro
- C. We can <u>rely on compilers</u> to help us avoid data hazards by reordering instructions.

COMP 273 Pipelines (55)

Based on Patterson's 610

## **Exception Handling**

- If the datapath and control aren't already complex enough, what about exceptions? Assignment Project Exam Help
- On an exception, ne tely to the exception handler https://eduassistpro.github.io/
  - Exceptions really must be precise; edu\_assiststpapbitrarily redefine the behavior to add delay slots.
- Definitely an advanced topic, beyond the scope of this course...

COMP 273 Pipelines (56)

Based on Patterson's 61

### Superscalar Laundry: Parallel per stage

More resources,

HW to match mix of parallel tasks?



COMP 273 Pipelines (59)

Based on Patterson's 610

# Things to Remember (1/2)

- Optimal Pipeline
  - Each stage is executing part of an instruction each clock cycle.
  - One instruction finisies during each Exam Help.
  - On average, execute <a href="https://eduassistpro.github.io/">https://eduassistpro.github.io/</a>
- What makes this work? WeChat edu\_assist\_pro
  - Similarities between instructions allow us to use same stages for all instructions (generally).
  - Each stage takes about the same amount of time as all others: little wasted time.

COMP 273 Pipelines (60)

Based on Patterson's 610

# Things to Remember (2/2)

- Pipelining is a BIG IDEA
  - widely used concept
- What makes it less tisament reciect Exam Help
  - Structural hazards: https://eduassistpro.githeเรลูธุ์he?
    - ⇒ Need more HW resources
  - Control hazards: need to WeChat edu\_assist\_pro h instructions?
    - ⇒ Delayed branch
  - Data hazards: an instruction depends on a previous instruction?
- Advanced techniques: branch prediction, out of order execution, superscalar

COMP 273 Pipelines (61)

Based on Patterson's 61

#### Review and More Information

- Textbook Section 4.5 and 4.6
- Hazards in Sections 4.7 and 4.8 Assignment Project Exam Help

https://eduassistpro.github.io/

Add WeChat edu\_assist\_pro

COMP 273 Pipelines (62)

Based on Patterson's 61C

### Extra Question



- Assume a MIPS machin <a href="https://eduassistpro.githukclie./">https://eduassistpro.githukclie./</a>/delayed branching, a 5 stage pipeline, forwarding, and interlo olved load hazards <a href="https://eduassist\_pro">Add WeChat edu\_assist\_pro</a>
- Respect register conventions
- Use only true assembly language
- Use careful instruction ordering to make a loop that takes the shortest possible number of cycles to complete



COMP 273 Pipelines (63)

Based on Patterson's 61

### Extra Question, Part 2

Partially unroll the memory copy function

```
blockcopy( int[] A, int[] B, int n ) {
    for ( int i = 0Asigniment=Project Exam Help
        A[i] = B[i];
        A[i+1] = B[i+1]
        A[i+2] = B[i+2] https://eduassistpro.github.io/
        A[i+3] = B[i+3];
        Add WeChat edu_assist_pro
}
```



- Same assumptions as before
- Respect register conventions and only use true MIPS
- Use careful instruction ordering to produce a loop that takes the shortest possible number of cycles to complete

COMP 273 Pipelines (64)

Based on Patterson's 61

### Questions

- How much faster is blockcopy than memcopy?
  - Can you make memcopy do 1 word in 5 cycles?
     Assignment Project Exam Help
     Can you make block cycles?

  - If yes, 1.82 times fas https://eduassistpro.github.io/

Add WeChat edu\_assist\_pro

